Hasegawa, Yuta
no journal
To enable large-scale large-eddy simulations (LES) of the aerodynamics of bodies with complex shapes and of local wind conditions in urban areas, we have implemented a multi-GPU lattice Boltzmann method (LBM) code with adaptive mesh refinement. In this presentation, we explain the optimization techniques applied to the code on the latest GPU platforms: single-GPU optimization, optimization of MPI communication, and a spatially parallel implementation for intra-node multi-GPU computation.
Hasegawa, Yuta; Onodera, Naoyuki; Idomura, Yasuhiro
no journal
In the "CityLBM" project at JAEA, a real-time urban wind prediction code based on adaptive mesh refinement (AMR) was developed. For the next generation of the CityLBM code, ensemble simulations are needed to improve the reliability of the prediction. For this purpose, the memory usage must be reduced so that each simulation fits within a single node, or 4-16 GPUs. To reduce the memory usage and accelerate data communication in the AMR code, we tried an intra-node multi-GPU implementation using Unified Memory in CUDA. This approach makes the parallel GPU implementation easy, because accesses to Unified Memory are managed automatically and go through HBM2 (local GPU) or NVLink (neighboring GPU). We implemented multi-GPU calculations for a 3D diffusion equation and a lattice Boltzmann equation on a uniform mesh, and tested weak and strong scalability as well as the performance of NVLink.
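
The slab decomposition behind such an intra-node multi-GPU diffusion test can be illustrated with a minimal CPU-only sketch in Python. It is not the authors' code: the function names (`make_field`, `diffuse_slab`, `slab_bounds`, `step`), the explicit 7-point stencil, the fixed boundary cells, and the coefficient `c` are all assumptions made for illustration. One shared array stands in for a Unified Memory allocation, and each loop iteration in `step` stands in for one GPU updating its own slab while reading neighboring planes directly from the shared array, with no explicit halo exchange.

```python
def make_field(nz, ny, nx):
    """Flat list holding an nz*ny*nx field with one hot cell in the centre."""
    field = [0.0] * (nz * ny * nx)
    field[((nz // 2) * ny + ny // 2) * nx + nx // 2] = 1.0
    return field


def slab_bounds(nz, n_dev):
    """Split nz planes into n_dev contiguous z-slabs, as evenly as possible."""
    base, extra = divmod(nz, n_dev)
    bounds, start = [], 0
    for dev in range(n_dev):
        end = start + base + (1 if dev < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def diffuse_slab(src, dst, shape, z_begin, z_end, c):
    """Explicit 7-point diffusion update for planes z_begin..z_end-1.

    Reads of planes z_begin-1 and z_end touch the neighbouring slab. With
    Unified Memory those reads would be served over the GPU interconnect;
    here they are ordinary reads from the shared list.
    """
    nz, ny, nx = shape
    plane = ny * nx
    # Outermost cells of the domain are treated as fixed and are skipped.
    for z in range(max(z_begin, 1), min(z_end, nz - 1)):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                i = (z * ny + y) * nx + x
                lap = (src[i - 1] + src[i + 1]
                       + src[i - nx] + src[i + nx]
                       + src[i - plane] + src[i + plane]
                       - 6.0 * src[i])
                dst[i] = src[i] + c * lap


def step(src, shape, n_dev, c=0.1):
    """One time step; each slab stands in for the work of one device."""
    dst = list(src)  # boundary cells keep their previous values
    for z_begin, z_end in slab_bounds(shape[0], n_dev):
        diffuse_slab(src, dst, shape, z_begin, z_end, c)
    return dst


if __name__ == "__main__":
    shape = (8, 6, 6)
    field = make_field(*shape)
    single = step(field, shape, n_dev=1)
    multi = step(field, shape, n_dev=4)
    print("decomposed result matches single-domain result:", single == multi)
```

Because every slab reads only from the old field and writes only its own planes, the decomposed update gives the same result as the single-domain update. In this sketch `c` plays the role of D·Δt/Δx² and should stay at or below 1/6 for the explicit scheme to remain stable.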